A Canonical Form of Arithmetic and Conditional Expressions

Authors

  • Torsten Görg
  • Mandy Northover
Abstract

When implementing software, developers express conceptual knowledge (e.g. about a specific feature) not only in programming language syntax and semantics but also in linguistic information stored in identifiers (e.g. method or class names) [6]. Based on this habit, Natural Language Program Analysis (NLPA) is used to improve many different areas of software engineering, such as code recommendation or program analysis [7]. Put simply, NLPA algorithms collect identifier names and apply term processing such as camel case splitting (e.g. “MyIdentifier” to “My” and “Identifier”) or stemming (e.g. “records” to “record”) before performing further analyses [10]. In our research context, we search for code locations that share similar terms in order to link them with each other. In such analyses, filtering stop words is essential to reduce the number of useless links. Merely collecting, splitting, and stemming identifier names can yield a list of terms of widely varying usefulness. For example, the terms “get” and “set” appear in most Java applications because of common coding practices, not because they express any conceptual knowledge. These terms corrupt the program analysis and lead to unreasonable findings. To reduce this noise, a typical approach in natural language processing is to filter out terms known to be useless (aka “stop words”). For natural languages, many stop word lists are publicly available. However, as Høst et al. [5] identified, developers use a more specific vocabulary than that of general spoken language. Common stop word lists are therefore unsuitable for program analysis; suitable lists even depend on domain, application type, developing company, and project settings. In this paper, we propose an approach to develop reusable stop word lists to improve NLPA. We i) propose to distinguish the different scopes a stop word list applies to (i.e. programming language, technology, and domain) and ii) recommend types of sources for terms to include.
Our approach is closely related to the work of Ratiu [8], who recommended considering domain knowledge for program analysis in general. We propose a specific application of the concept as guidelines
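
The term processing described in the abstract (camel case splitting, stemming, stop word filtering) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the crude plural-stripping stemmer, and the three-word stop list are all assumptions made for illustration, and a real analysis would use a proper stemmer and a scope-specific stop word list.

```python
import re

# Hypothetical stop word list, for illustration only; the abstract
# names "get" and "set" as examples but does not publish a full list.
STOP_WORDS = {"get", "set", "is"}


def split_camel_case(identifier):
    """Split an identifier such as 'MyIdentifier' into its parts."""
    # Acronym runs (e.g. 'HTTP'), capitalized or lowercase words, digit runs.
    return re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", identifier)


def naive_stem(term):
    """Strip a trailing plural 's'; a crude stand-in for a real stemmer."""
    if len(term) > 3 and term.endswith("s") and not term.endswith("ss"):
        return term[:-1]
    return term


def extract_terms(identifiers, stop_words=STOP_WORDS):
    """Split, lowercase, and stem identifier names, then drop stop words."""
    terms = []
    for identifier in identifiers:
        for part in split_camel_case(identifier):
            term = naive_stem(part.lower())
            if term not in stop_words:
                terms.append(term)
    return terms


print(extract_terms(["getCustomerRecords", "setHTTPResponse"]))
```

Passing a different `stop_words` set per scope (programming language, technology, domain) is one simple way to reflect the scoping idea from the abstract.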

Similar resources

Dead Code Detection On Class Level

This paper contributes to code clone detection by providing an algorithm that calculates canonical forms of arithmetic and conditional expressions. An experimental evaluation shows the relevance of such expressions in real code. The proposed normalization can be used in addition to dataflow normalizations.
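
To give a rough idea of what normalizing expressions for clone detection can look like, the following Python sketch maps arithmetic expressions to a string key that ignores operand order and grouping for `+` and `*`. This is only an illustration of the general idea, not the algorithm from the paper: the helper names are hypothetical, it uses Python's own `ast` module (with `ast.unparse`, available from Python 3.9), and it assumes exact arithmetic in which reassociation is safe.

```python
import ast

# Operators treated as commutative and associative in this sketch.
# Assumption: exact arithmetic; floating point would need more care.
COMMUTATIVE = (ast.Add, ast.Mult)


def flatten(node, op_type):
    """Collect the operands of a chain of the same commutative operator."""
    if isinstance(node, ast.BinOp) and isinstance(node.op, op_type):
        return flatten(node.left, op_type) + flatten(node.right, op_type)
    return [node]


def canonical(node):
    """Return a key that is identical for reordered '+' and '*' operands."""
    if isinstance(node, ast.BinOp):
        if isinstance(node.op, COMMUTATIVE):
            op_type = type(node.op)
            operands = sorted(canonical(n) for n in flatten(node, op_type))
            symbol = "+" if op_type is ast.Add else "*"
            return "(" + symbol.join(operands) + ")"
        # Non-commutative operators keep their operand order.
        op_name = type(node.op).__name__
        return f"({canonical(node.left)} {op_name} {canonical(node.right)})"
    return ast.unparse(node)


def canonical_form(source):
    """Parse a single expression and return its canonical key."""
    return canonical(ast.parse(source, mode="eval").body)


print(canonical_form("a + b * c") == canonical_form("c * b + a"))
```

Two code fragments whose expressions yield the same key are candidates for being clones; operators such as subtraction are deliberately left order-sensitive.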

A Note on the First Geometric-Arithmetic Index of Hexagonal Systems and Phenylenes

The first geometric-arithmetic index was introduced in chemical theory as the sum of $2\sqrt{d_u d_v}/(d_u + d_v)$ over all edges $uv$ of the graph, where $d_u$ stands for the degree of the vertex $u$. In this paper we give expressions for computing the first geometric-arithmetic index of hexagonal systems and phenylenes and present a new method for describing hexagonal system by corresponding a simple g...
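
As a small worked illustration, the first geometric-arithmetic index, i.e. the sum of 2·sqrt(d_u·d_v)/(d_u + d_v) over all edges uv, can be computed directly from an edge list. The function name and the edge-list representation below are assumptions for this sketch and are not taken from the paper.

```python
from math import sqrt


def first_ga_index(edges):
    """Sum 2*sqrt(d_u*d_v)/(d_u + d_v) over all edges (u, v) of a simple graph."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    return sum(
        2 * sqrt(degree[u] * degree[v]) / (degree[u] + degree[v])
        for u, v in edges
    )


# A single hexagon (6-cycle): every vertex has degree 2,
# so each of the 6 edges contributes 2*sqrt(4)/4 = 1.
hexagon = [(i, (i + 1) % 6) for i in range(6)]
print(first_ga_index(hexagon))
```

For any regular graph every edge contributes exactly 1, so the index equals the number of edges; the value only deviates from the edge count when adjacent vertices have different degrees.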

Taylor Expansion Diagrams: A Compact Canonical Representation for Arithmetic Expressions

This paper presents a new, compact, canonical representation for arithmetic expressions, called Taylor Expansion Diagram. It can be used to facilitate the verification of RTL specifications and hardware implementations of arithmetic designs, and specifically the equivalence checking of complex algebraic and arithmetic expressions that arise in symbolic verification. This new representation is b...

Representing Boolean Functions with If-Then-Else DAGs

This article describes the use of binary decision diagrams (BDDs) and if-then-else DAGs for representing and manipulating Boolean functions. Two-cuts are defined for binary decision diagrams, and a relationship is exhibited between general if-then-else expressions and the two-cuts of a BDD for the same function. An algorithm for computing all two-cuts of a BDD in O(n^2) time is given. A new can...

Determining the order of minimal realization of descriptor systems without use of the Weierstrass canonical form

A common method to determine the order of minimal realization of a continuous linear time invariant descriptor system is to decompose it into slow and fast subsystems using the Weierstrass canonical form. The Weierstrass decomposition should be avoided because it is generally an ill-conditioned problem that requires many complex calculations especially for high-dimensional systems. The present ...

Three Datatype Defining Rewrite Systems for Datatypes of Integers each extending a Datatype of Naturals

Integer arithmetic is specified according to three views: unary, binary, and decimal notation. The binary and decimal views have as their characteristic that each normal form resembles common number notation, that is, either a digit, or a string of digits without leading zero, or the negated versions of the latter. The unary view comprises a specification of integer arithmetic based on 0, succes...

Journal:
  • Softwaretechnik-Trends

Volume 34

Publication date: 2014